Formalizing Triggers: A Learning Model for Finite Spaces
Abstract
In a recent seminal paper, Gibson and Wexler ([1], GW) take important steps to formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of parameters. One of their aims is to characterize the complexity of learning in such spaces. For example, they demonstrate that even in finite spaces, convergence may be a problem since it is possible under some single-step gradient ascent methods to remain at a local maximum. From the standpoint of learning theory, however, GW leave open several questions that can be addressed by a more precise formalization in terms of Markov structures (a possible formalization suggested but left unpursued in a footnote of GW). In this paper we explicitly formalize learning in a finite parameter space as a Markov structure whose states are parameter settings. Several important results that follow directly from this characterization include: (1) a corrected version of GW's central convergence proof; (2) an explicit formula for calculating the transition probabilities between hypotheses and the existence of "problem states" in addition to local maxima; (3) an explicit calculation of the time needed to converge, in terms of number of (positive) examples; (4) the convergence and comparison of several variants of the GW learning procedure, e.g., random walk; (5) batch and PAC-style learning bounds for the model.

Copyright © Massachusetts Institute of Technology, 1993. This report describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive Sciences, and at the Artificial Intelligence Laboratory.
This research is supported by NSF grant 9217041-ASC and ARPA under the HPCC program. Correspondence by e-mail could be directed to pn@ai.mit.edu or berwick@ai.mit.edu.

1 Introduction: The Triggering Model as a Markov structure

Recently, Gibson and Wexler ([1], GW) have begun to formalize the notion of language learning in a (finite) space whose grammars (and languages) are characterized by a finite number of parameters or 1-dimensional Boolean-valued arrays, n long. A grammar in this space is simply a particular n-length array of 0's and 1's; hence there are 2^n possible grammars (languages). One of Gibson and Wexler's aims is to establish that under some simple hill-climbing learning regimes, namely, single-step gradient ascent, some linguistically natural, finite spaces are unlearnable, in the sense that positive-only examples lead to local maxima: incorrect hypotheses from which a learner can never escape. More broadly, they wish to show that learnability in such spaces is still an interesting problem, in that there is a substantive learning theory concerning feasibility, convergence time, and the like, that must be addressed beyond traditional linguistic theory and that might even choose between otherwise adequate linguistic theories.

In this paper, we choose as a convenient starting point their Triggering Learning Algorithm (TLA) to focus our investigation of parameter learning. Our central result is that the performance of this algorithm is completely modeled by a Markov chain. The remainder of the current paper is devoted to exploring the basic consequences of this fact. Let us first review the GW model and the TLA.
Following Gold [2] the basic framework is that of identification in the limit. The learner (child) starts out in an arbitrary state = some setting of the n parameter values. The learner (child) receives a (countably infinite) sequence of positive example sentences drawn from some target language, L_t. After each presentation, the learner can either (i) stay in the same state; or (ii) move to a new hypothesis state, using the algorithm given below. If after some finite number of examples the learner converges to the correct target language (= parameter settings) and never changes state, then it has correctly identified the target language; otherwise, it does not converge.

In addition, in the GW model the language learner obeys two fundamental constraints: (1) the single-value constraint: the learner can change only 1 parameter value at a time; and (2) the greediness constraint: if the learner is given a positive example it cannot recognize (accept), and if the learner changes one parameter value and finds that it can accept the example, then the learner retains that new parameter value. Finally, we also recall GW's definition of a local trigger (minor notational changes aside): given values for all parameters but one, a local trigger for value v of parameter p_i, p_i(v), is a sentence s from the target grammar G_T such that s is grammatical iff p_i(v) = v.

GW then state their TLA as follows:

[Initialize] Step 1. Start at some random point in the (finite) space of possible parameter settings, specifying a single hypothesized grammar with its resulting extension as a language;

[Process input sentence] Step 2. Receive a positive example sentence s_i at time t_i (examples drawn from the language of a single target grammar, L(G_t)), from a uniform distribution on the language (we shall be able to relax this distributional constraint later on);

[Learnability on error detection] Step 3. If the current grammar parses (generates) s_i, then go to Step 2; otherwise, continue.

[Single-step gradient ascent] Select a single parameter at random, uniformly with probability 1/n, to flip from its current setting, and change it (0 mapped to 1, 1 to 0) iff that change allows the current sentence to be analyzed; otherwise go to Step 2;

Of course, this algorithm never halts in the usual sense. GW aim to show under what conditions this algorithm converges "in the limit", that is, after some number, n, of steps, where n is unknown, the correct target parameter settings will be selected and never be changed. Their central claim is stated as their Theorem 1 (p. 7 in their manuscript).¹

Theorem 1 As long as the probability is always greater than a lower bound b (b > 0) that the learner will 1) encounter a local trigger for some incorrectly-set parameter P, and 2) then reset P accordingly to the target value, it turns out that the target grammar can always be learned using the Triggering Learning Algorithm.

1.1 The Markov formulation

From the standpoint of learning theory, however, GW leave open several questions that can be addressed by a more precise formalization of this model in terms of Markov chains (a possible formalization suggested but left unpursued in footnote 9 of GW).
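Before the Markov picture is developed, the TLA update stated in Steps 1 to 3 can be sketched in code. This is only a minimal illustration under stated assumptions: the two-parameter family TOY_LANGUAGES, its one-letter "sentences", and the helper names flip, tla_step and run_tla are hypothetical and do not come from GW or from this report. Each grammar is represented simply by the finite set of sentences it accepts.

```python
import random

# Hypothetical toy family (an assumption for illustration only):
# each 2-parameter setting maps to the finite set of sentences it accepts.
TOY_LANGUAGES = {
    (0, 0): {"a"},
    (0, 1): {"a", "b"},
    (1, 0): {"a", "c"},
    (1, 1): {"a", "b", "c"},
}


def flip(state, i):
    """Return a copy of `state` with the i-th binary parameter toggled."""
    return state[:i] + (1 - state[i],) + state[i + 1:]


def tla_step(state, sentence, languages, rng):
    """One pass through Steps 2-3 of the TLA for a single example sentence."""
    # Error-driven: if the current grammar accepts the sentence, do nothing.
    if sentence in languages[state]:
        return state
    # Single-value constraint: pick one parameter uniformly (probability 1/n).
    i = rng.randrange(len(state))
    candidate = flip(state, i)
    # Greediness constraint: keep the flip only if it makes the sentence parse.
    if sentence in languages[candidate]:
        return candidate
    return state


def run_tla(start, target, languages, steps, rng):
    """Feed `steps` examples drawn uniformly from the target language."""
    examples = sorted(languages[target])
    state = start
    for _ in range(steps):
        state = tla_step(state, rng.choice(examples), languages, rng)
    return state


if __name__ == "__main__":
    final = run_tla((0, 0), (1, 1), TOY_LANGUAGES, 1000, random.Random(0))
    print("final hypothesis:", final)
```

In this toy family every non-target state has a neighbor that accepts some sentence it rejects, so the learner is not trapped; the spaces GW study need not have that property.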
We can picture the hypothesis space, of size 2^n, as a set of points, each corresponding to one particular array of parameter settings (languages, grammars). Call each point a hypothesis state or simply state of this space. As is conventional, we define these languages over some alphabet Σ as a subset of Σ*. One of them is the target language (grammar). We arbitrarily place the (single) target grammar at the center of this space. Since by the TLA the learner is restricted to moving at most 1 binary value in a single step, the theoretically possible transitions between states can be drawn as (directed) lines connecting parameter arrays (hypotheses) that differ by at most 1 binary digit (a 0 or a 1 in some corresponding position in their arrays). Recall that this is the so-called Hamming distance. We may further place weights on the transitions from state i to state j corresponding to the non-zero b's mentioned in the theorem above; these correspond to the probabilities that the learner will move from hypothesis state i to state j. In fact, as we shall show below, given a distribution over L(G), we can further carry out the calculation of the actual b's themselves. Thus, we can picture the TLA learning space as a directed, labeled graph V with 2^n vertices.² More precisely, we can make the following remarks about the TLA system GW

¹ Note that the notion of "trigger" does not enter into the statement of the TLA or the constraints the TLA employs, but only into the statement of the theorem.
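As an illustration of the weighted, directed graph described in this section, the sketch below tabulates transition weights for a small hypothetical family of languages. The weights are read off Steps 2 and 3 of the TLA under the uniform-distribution assumption: the learner moves from state s to a Hamming neighbor t with probability (1/n) times the fraction of target sentences that s rejects and t accepts. This reading is an assumption of the sketch and is not claimed to be the explicit formula given later in the paper; TOY_LANGUAGES, hamming_neighbors and transition_matrix are hypothetical names.

```python
from fractions import Fraction

# Hypothetical toy family (an assumption for illustration only).
TOY_LANGUAGES = {
    (0, 0): {"a"},
    (0, 1): {"a", "b"},
    (1, 0): {"a", "c"},
    (1, 1): {"a", "b", "c"},
}


def hamming_neighbors(state):
    """Yield every parameter setting at Hamming distance 1 from `state`."""
    for i in range(len(state)):
        yield state[:i] + (1 - state[i],) + state[i + 1:]


def transition_matrix(languages, target):
    """Return P[s][t], the one-step TLA probability of moving from s to t."""
    states = sorted(languages)
    n = len(target)
    examples = languages[target]
    P = {s: {t: Fraction(0) for t in states} for s in states}
    for s in states:
        for t in hamming_neighbors(s):
            # Target sentences that s fails to parse but t parses.
            moving = [x for x in examples
                      if x not in languages[s] and x in languages[t]]
            # Uniform draw of the sentence, then probability 1/n of picking
            # the one parameter whose flip leads to t.
            P[s][t] = Fraction(len(moving), n * len(examples))
        # All remaining probability mass stays on the current state.
        P[s][s] = 1 - sum(P[s][t] for t in states if t != s)
    return P


if __name__ == "__main__":
    P = transition_matrix(TOY_LANGUAGES, (1, 1))
    for s in sorted(P):
        print(s, {t: str(p) for t, p in P[s].items() if p})
```

In this toy example the target state carries a self-loop of weight 1, so it is absorbing, which matches the requirement that a learner who reaches the target never changes state again.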
Publication date: 2001